Preprocessing of gene-expression data related to breast cancer diagnosis
Abstract
The work is performed in close cooperation with the University of Tromsø and Professor Eiliv Lund and is financed by the ERC TICE project. This note describes the preprocessing steps for gene expression data and focuses particularly on the filtering and normalization steps, as the choices made here greatly affect the set of probes used in later analyses. In the filtering step, two parameters are set. First, a cut-off for the detection p-value of each probe is set; a probe is considered present in a given sample if its detection p-value is smaller than this cut-off. Second, the present limit is set, which determines in how many samples a probe has to be present in order to be included in the dataset. The results show that a p-value cut-off of 0.01 and a present limit of 0.01 are reasonable choices. After filtering, the data can be normalized. Four different approaches are evaluated, and for the available dataset, quantile normalization of the data on the original scale gives the most stable results.
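The two steps described in the abstract can be illustrated with a minimal sketch. It is not the note's actual implementation: the function names `filter_probes` and `quantile_normalize` are hypothetical, the present limit is assumed to be a fraction of samples compared with `>=`, and the quantile normalization shown is the textbook version without special handling of ties or missing values.

```python
import numpy as np


def filter_probes(detection_p, p_cutoff=0.01, present_limit=0.01):
    """Return a boolean mask of probes to keep.

    detection_p: array of shape (n_probes, n_samples) with detection p-values.
    Assumption: present_limit is the minimum fraction of samples in which
    a probe must be present (the abstract does not spell out the unit).
    """
    # A probe is present in a sample if its detection p-value is below the cut-off.
    present = detection_p < p_cutoff
    # Fraction of samples in which each probe is present.
    frac_present = present.mean(axis=1)
    return frac_present >= present_limit


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of x on its given scale.

    x: array of shape (n_probes, n_samples), no NaNs.
    Ties within a column are broken by sort order, not averaged.
    """
    # Row indices that would sort each column.
    order = np.argsort(x, axis=0)
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    mean_quantiles = sorted_x.mean(axis=1)
    # Rank of each original entry within its column.
    ranks = np.argsort(order, axis=0)
    return mean_quantiles[ranks]
```

After normalization every sample has the same empirical distribution, so in this sketch filtering is applied first and the surviving rows are passed to `quantile_normalize`, matching the order of steps given in the abstract.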
Similar resources
Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis
Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...
Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data
Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...
Evaluation of PALB2 Gene Expression in Breast Cancer
Breast cancer is the most prevalent malignancy and the second leading cause of cancer mortality in women globally. In Iran it accounts for 16 percent of all cancers and is the most prevalent cancer in Iranian women. PALB2, identified as a partner and localizer of BRCA2 and proposed to play a role in the DNA damage response, has recently been identified as a breast canc...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding helpful molecular patterns. The large volume of gene expression data also makes it more challenging to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
PTEN Gene Expression and Its Association with rs10490920 SNP in Breast Cancer
Introduction: The PTEN gene, also known as MMAC1 or TEP1, is a tumor suppressor gene. One of the important polymorphisms of this gene is the rs10490920 SNP. The purpose of this study was to determine the PTEN gene expression and its relation to changes in rs10490920 polymorphism in breast cancer. Methods: In this study, 40 breast cancer patients and 10 healthy controls were considered. The expr...
Effect of topotycan and zinc oxide nanoparticles combination on cytotoxicity and P53 gene expression against breast cancer (MCF-7) cell line
Introduction: Breast cancer is one of the most common malignancies in women worldwide. Today, nanoparticles are one of the hopes for the treatment and diagnosis of many diseases, including cancer. The present study aimed to explore the effect of topotycan and zinc oxide nanoparticles (ZnONPs) combination on cytotoxicity and P53 gene expression in MCF7 breast cancer cells. Materials and Methods: ...